Abstract: Elaborating on prior work by Minka, we formulate a general
computation rule for lossy messages. An important special case (with many
applications in communications) is the conversion of "soft-bit" messages to
Gaussian messages. By this method, the performance of a Kalman equalizer is
improved, both for uncoded and coded transmission.

I. Introduction 

We consider message passing algorithms in factor graphs [1], [2]. If the
factor graph has no cycles, the messages computed by the basic sum-product
and max-product algorithms are exact summaries of the subgraph behind the
corresponding edge. However, in many applications (especially with
continuous variables), complexity considerations suggest, or even dictate,
the use of approximate or lossy summaries. For example, it is customary to
use Gaussian messages even in cases where the "true" (sum-product or
max-product) messages are not Gaussian, or to use scalar (i.e.,
single-variable) messages instead of multi-dimensional (i.e.,
multi-variable) messages.

In this paper, we first formulate a general message update rule for lossy
summaries/messages that is a nontrivial generalization of the standard
sum-product or max-product rules. This rule was in essence proposed by
Minka [3], [4], but our general formulation of it may not be obvious from
Minka's work.

We then focus on one particular application: the conversion of binary
("soft-bit") messages into Gaussian messages, which has many uses in
communications. For our numerical examples, we then further focus on
equalization: we give simulation results for an iterative Kalman equalizer
both for a linear FIR (finite impulse response) channel and for a linear
IIR (infinite impulse response) channel. For uncoded transmission, the new
algorithm almost closes the gap between the BCJR algorithm and the LMMSE
(linear minimum mean squared error) equalizer; for coded transmission, the
new algorithm improves the performance of the iterative Kalman equalizer at
very little additional cost.

It should be noted that the new message computation rule 
yields iterative algorithms even for cycle-free graphs. We 
also note that some sort of damping is usually required to 
stabilize the algorithm. 
Fig. 1. Lossy message $\overrightarrow{\mu}_{\mathrm{approx}}$ along a general edge/variable X.



In this paper, we will use Forney-style factor graphs as 
in [2] where edges represent variables and nodes represent 
factors. 

II. A General Computation Rule for Lossy Messages/Summaries

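Consider the messages along a general edge (variable) X in some factor
graph as illustrated in Fig. 1. Let $\overrightarrow{\mu}_{\mathrm{true}}(x)$
be the "true" sum-product or max-product message which we want (or need) to
replace by a message $\overrightarrow{\mu}_{\mathrm{approx}}(x)$ in some
prescribed family of functions (e.g., Gaussians). In such cases, most
writers (including these authors) used to compute
$\overrightarrow{\mu}_{\mathrm{approx}}$ as some approximation of
$\overrightarrow{\mu}_{\mathrm{true}}$. However, the semantics of factor
graphs suggests another approach. Note that the factor graph of Fig. 1
represents the function

$$f(x) = \overrightarrow{\mu}_{\mathrm{true}}(x)\,\overleftarrow{\mu}(x), \tag{1}$$

which the replacement of $\overrightarrow{\mu}_{\mathrm{true}}$ by
$\overrightarrow{\mu}_{\mathrm{approx}}$ will change into

$$\tilde{f}(x) = \overrightarrow{\mu}_{\mathrm{approx}}(x)\,\overleftarrow{\mu}(x). \tag{2}$$

It is thus natural to first compute

$$\tilde{f}(x) = \text{some approximation of } \overrightarrow{\mu}_{\mathrm{true}}(x)\,\overleftarrow{\mu}(x) \tag{3}$$

and then to compute $\overrightarrow{\mu}_{\mathrm{approx}}$ from (2). The
approximation in (3) must be chosen so that solving (2) for
$\overrightarrow{\mu}_{\mathrm{approx}}$ yields a function in the prescribed
family.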

Important special cases of this general approach (including the Gaussian
case) were proposed as "expectation propagation" in [3] and [4].

The choice of a suitable approximation in (3) will, in general, depend on
the application. For many applications, a natural approach (proposed and
pursued by Minka) is to minimize the Kullback-Leibler divergence:

$$\tilde{f} = \operatorname*{argmin}_{\tilde{f} \text{ in chosen family}} D(f \,\|\, \tilde{f}). \tag{4}$$



Fig. 2. Converting a soft-bit message $\overrightarrow{\mu}_{b}$ into a Gaussian message $\overrightarrow{\mu}_{gs}$ or $\overrightarrow{\mu}_{gM}$.

In this paper, the approximate messages will always be Gaussian. However,
other families of functions can be used. For example, multivariable
messages with a prescribed Markov-chain structure were used in [6]; with
hindsight, the update rule for such messages that was proposed in [6] is
indeed an example of the general scheme described here. A related idea was
proposed in [5].
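As an illustration of the rule (1)-(4) for the Gaussian family, the following minimal sketch evaluates the true global function on a grid, matches its first two moments (which is the solution of (4) within the Gaussian family), and then solves (2) for the approximate message. The grid discretization, the function names, and the bimodal example message are assumptions made here for illustration only; they are not taken from the original experiments.

```python
import math


def gaussian(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def lossy_gaussian_message(mu_true, back_mean, back_var, grid):
    """Return (mean, variance) of the Gaussian mu_approx obtained via (1)-(4).

    mu_true is any nonnegative function of x; the backward message is assumed
    to be Gaussian; grid is a uniform list of x values (an assumed
    discretization used only for this sketch).
    """
    dx = grid[1] - grid[0]
    # (1): true global function f(x) = mu_true(x) * mu_back(x) on the grid.
    f = [mu_true(x) * gaussian(x, back_mean, back_var) for x in grid]
    z = sum(f) * dx
    # (3), (4): within the Gaussian family, the minimizer of D(f || f_tilde)
    # has the same mean and variance as the normalized f.
    f_mean = sum(x * v for x, v in zip(grid, f)) * dx / z
    f_var = sum((x - f_mean) ** 2 * v for x, v in zip(grid, f)) * dx / z
    # (2): solve f_tilde = mu_approx * mu_back for mu_approx by subtracting
    # precisions and precision-weighted means.
    precision = 1.0 / f_var - 1.0 / back_var
    weighted_mean = f_mean / f_var - back_mean / back_var
    return weighted_mean / precision, 1.0 / precision


def bimodal(x):
    """A non-Gaussian example message: an equal mixture of two Gaussians."""
    return 0.5 * gaussian(x, -1.0, 0.3) + 0.5 * gaussian(x, 1.0, 0.3)


grid = [-10.0 + 0.01 * i for i in range(2001)]
print(lossy_gaussian_message(bimodal, 0.5, 1.0, grid))
```

Note that the resulting message depends on the backward message, which is what distinguishes this rule from approximating the true message in isolation. If the true message is itself Gaussian, the sketch returns that Gaussian (up to discretization error).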

III. Converting Soft-Bit Messages to Gaussian Messages

We will now apply the general scheme of the previous section to the
conversion of messages defined on the finite alphabet $\{+1,-1\}$ into
Gaussian messages. The setup is shown in Fig. 2, which is (a part of) a
factor graph with an equality constraint between the real variable X and
the $\{+1,-1\}$-valued variable Y. (The equality constraint node in Fig. 2
may formally be viewed as representing the factor $\delta(x - y)$, which is
to be understood as a Dirac delta in x and a Kronecker delta in y.) The
messages $\overrightarrow{\mu}_{b}$ and $\overleftarrow{\mu}_{b}$ are
defined on the finite alphabet $\{+1,-1\}$ and the messages
$\overleftarrow{\mu}_{g}$, $\overrightarrow{\mu}_{gs}$, and
$\overrightarrow{\mu}_{gM}$ are Gaussians; $\overrightarrow{\mu}_{gs}$
denotes the standard Gaussian approximation and $\overrightarrow{\mu}_{gM}$
denotes the alternative Gaussian approximation due to Minka, as will be
detailed below.

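Let us first recall the conversion of Gaussian messages into soft-bit
messages. Let $\overleftarrow{m}_{g}$ and $\overleftarrow{\sigma}_{g}^{2}$
be the mean and the variance, respectively, of $\overleftarrow{\mu}_{g}$.
The (lossless) conversion from $\overleftarrow{\mu}_{g}$ to
$\overleftarrow{\mu}_{b}$ is an immediate and standard application of the
sum-product (or max-product) rule [1], [2]:

$$\begin{pmatrix} \overleftarrow{\mu}_{b}(+1) \\ \overleftarrow{\mu}_{b}(-1) \end{pmatrix} \propto \begin{pmatrix} \overleftarrow{\mu}_{g}(+1) \\ \overleftarrow{\mu}_{g}(-1) \end{pmatrix}; \tag{5}$$

in the standard logarithmic representation, this becomes

$$\ln \frac{\overleftarrow{\mu}_{b}(+1)}{\overleftarrow{\mu}_{b}(-1)} = \frac{2\,\overleftarrow{m}_{g}}{\overleftarrow{\sigma}_{g}^{2}}. \tag{6}$$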
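The conversion (5), (6) can be sketched in a few lines; the function names below are chosen here for illustration. The second helper uses the fact that a $\{+1,-1\}$-valued message with log-likelihood ratio $\lambda$ has mean $\tanh(\lambda/2)$.

```python
import math


def gaussian_to_softbit_llr(m_g, var_g):
    """Log-likelihood ratio ln(mu_b(+1) / mu_b(-1)) of a Gaussian message, as in (6)."""
    return 2.0 * m_g / var_g


def llr_to_mean(llr):
    """Mean of a {+1,-1}-valued soft-bit message with the given log-likelihood ratio."""
    return math.tanh(llr / 2.0)


# Example with hypothetical numbers: backward Gaussian with mean 0.4, variance 1.5.
llr = gaussian_to_softbit_llr(0.4, 1.5)
print(llr, llr_to_mean(llr))
```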



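We now turn to the more interesting lossy conversion of
$\overrightarrow{\mu}_{b}$ into a Gaussian. Let $\overrightarrow{m}_{b}$ and
$\overrightarrow{\sigma}_{b}^{2}$ be the mean and the variance,
respectively, of $\overrightarrow{\mu}_{b}$, which are given by

$$\overrightarrow{m}_{b} = \frac{\overrightarrow{\mu}_{b}(+1) - \overrightarrow{\mu}_{b}(-1)}{\overrightarrow{\mu}_{b}(+1) + \overrightarrow{\mu}_{b}(-1)}, \tag{7}$$

$$\overrightarrow{\sigma}_{b}^{2} = 1 - (\overrightarrow{m}_{b})^{2}. \tag{8}$$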



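The traditional approach forms the Gaussian message
$\overrightarrow{\mu}_{gs}$ (with mean $\overrightarrow{m}_{gs}$ and
variance $\overrightarrow{\sigma}_{gs}^{2}$) from the mean and the variance
of $\overrightarrow{\mu}_{b}$:

$$\overrightarrow{m}_{gs} = \overrightarrow{m}_{b} \quad \text{and} \quad \overrightarrow{\sigma}_{gs}^{2} = \overrightarrow{\sigma}_{b}^{2}. \tag{9}$$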
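For concreteness, the standard conversion (7)-(9) may be sketched as follows. The function names are illustrative; the soft-bit message is passed as its two (not necessarily normalized) values.

```python
def softbit_moments(mu_plus, mu_minus):
    """Mean and variance of a {+1,-1}-valued message, as in (7) and (8)."""
    m_b = (mu_plus - mu_minus) / (mu_plus + mu_minus)
    return m_b, 1.0 - m_b ** 2


def standard_gaussian_message(mu_plus, mu_minus):
    """Standard Gaussian approximation (9): copy the soft-bit mean and variance."""
    return softbit_moments(mu_plus, mu_minus)


# Example: a soft bit with values 0.9 and 0.1 for +1 and -1, respectively.
print(standard_gaussian_message(0.9, 0.1))
```

This conversion uses only the forward soft-bit message; the backward Gaussian message plays no role in it.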



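The approach of Section II yields another Gaussian message
$\overrightarrow{\mu}_{gM}$ (with mean $\overrightarrow{m}_{gM}$ and
variance $\overrightarrow{\sigma}_{gM}^{2}$) as follows. In Fig. 2, the true
global function corresponding to (1) is

$$\delta(x - 1)\,\overrightarrow{\mu}_{b}(+1)\,\overleftarrow{\mu}_{g}(+1) + \delta(x + 1)\,\overrightarrow{\mu}_{b}(-1)\,\overleftarrow{\mu}_{g}(-1), \tag{10}$$

which (when properly normalized) has mean

$$m_{\mathrm{true}} = \frac{\overrightarrow{m}_{b} + \overleftarrow{m}_{b}}{1 + \overrightarrow{m}_{b}\,\overleftarrow{m}_{b}} \tag{11}$$

and variance

$$\sigma_{\mathrm{true}}^{2} = 1 - (m_{\mathrm{true}})^{2}, \tag{12}$$

where $\overleftarrow{m}_{b}$ is the mean of $\overleftarrow{\mu}_{b}$ (5),
which is formed as in (7). The approximate global function (corresponding
to (2)) is the Gaussian

$$\overrightarrow{\mu}_{gM}(x)\,\overleftarrow{\mu}_{g}(x) \tag{13}$$

with mean $\tilde{m}_{g}$ and variance $\tilde{\sigma}_{g}^{2}$ given by

$$1/\tilde{\sigma}_{g}^{2} = 1/\overrightarrow{\sigma}_{gM}^{2} + 1/\overleftarrow{\sigma}_{g}^{2}, \tag{14}$$

$$\tilde{m}_{g}/\tilde{\sigma}_{g}^{2} = \overrightarrow{m}_{gM}/\overrightarrow{\sigma}_{gM}^{2} + \overleftarrow{m}_{g}/\overleftarrow{\sigma}_{g}^{2}. \tag{15}$$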



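Now a natural choice for the approximation (3) is to equate the mean and
the variance of the Gaussian approximation with the corresponding moments
of the true global function:

$$\tilde{m}_{g} = m_{\mathrm{true}} \quad \text{and} \quad \tilde{\sigma}_{g}^{2} = \sigma_{\mathrm{true}}^{2}. \tag{16}$$

(As pointed out by Minka, this choice may be derived from (4).) The desired
Gaussian message $\overrightarrow{\mu}_{gM}$ is thus obtained by first
evaluating (11) and (12) and then computing
$\overrightarrow{\sigma}_{gM}^{2}$ and $\overrightarrow{m}_{gM}$ from (14)
and (15).

Note that, in general, the message $\overrightarrow{\mu}_{gM}$ is not
trivial even if $\overrightarrow{\mu}_{b}$ is neutral
($\overrightarrow{m}_{b} = 0$ and $\overrightarrow{\sigma}_{b}^{2} = 1$).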
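The complete conversion can be sketched as below: evaluate (11) and (12), impose (16), and solve (14) and (15) for the mean and variance of the Minka message. The function name and the example numbers are assumptions made here. The sketch does not guard against the degenerate cases where the true variance is zero or where it equals the variance of the backward Gaussian (zero precision).

```python
import math


def minka_gaussian_message(m_b_fwd, m_g_back, var_g_back):
    """Return (mean, variance) of the Gaussian message mu_gM.

    m_b_fwd: mean of the forward soft-bit message, as in (7).
    m_g_back, var_g_back: mean and variance of the backward Gaussian message.
    """
    # Mean of the backward soft-bit message: by (6) its log-likelihood ratio
    # is 2*m/var, and a soft bit with LLR lam has mean tanh(lam / 2).
    m_b_back = math.tanh(m_g_back / var_g_back)
    # (11), (12): moments of the true global function.
    m_true = (m_b_fwd + m_b_back) / (1.0 + m_b_fwd * m_b_back)
    var_true = 1.0 - m_true ** 2
    # (16) inserted into (14), (15), solved for the forward Gaussian message.
    precision = 1.0 / var_true - 1.0 / var_g_back
    weighted_mean = m_true / var_true - m_g_back / var_g_back
    return weighted_mean / precision, 1.0 / precision


# Neutral soft bit (mean 0) and a hypothetical backward Gaussian with mean 0.5
# and variance 2: the result differs from the standard message (mean 0, variance 1).
print(minka_gaussian_message(0.0, 0.5, 2.0))
```

The example illustrates the remark above: even for a neutral soft bit, the resulting message depends on the backward Gaussian message and is not the standard zero-mean, unit-variance Gaussian.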

IV. Issues 

A. Negative "Variance" 

Solving (14) for $\overrightarrow{\sigma}_{gM}^{2}$ may result in a negative
value for $\overrightarrow{\sigma}_{gM}^{2}$. (This indeed happens in the
examples to be described in Section V.) In such cases,
$\overrightarrow{\mu}_{gM}$ is a correction factor (not itself a probability
density function) that tries to compensate for an overly confident
$\overleftarrow{\mu}_{g}$. The product (13) usually remains a valid
probability density function, up to a scale factor.
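A small numerical illustration of this effect, with numbers chosen here purely as an example: when the backward Gaussian has a variance smaller than the variance of the true global function, the precision (inverse "variance") obtained from (14) is negative, while the precision of the product (13) stays positive. Keeping the message as a precision and a precision-weighted mean, as assumed in this sketch, avoids dividing by a quantity that may change sign.

```python
import math


def minka_information_form(m_b_fwd, m_g_back, var_g_back):
    """Return (precision, precision * mean) of mu_gM; the precision may be negative."""
    m_b_back = math.tanh(m_g_back / var_g_back)                  # soft-bit mean from (6)
    m_true = (m_b_fwd + m_b_back) / (1.0 + m_b_fwd * m_b_back)   # (11)
    var_true = 1.0 - m_true ** 2                                 # (12)
    precision = 1.0 / var_true - 1.0 / var_g_back                # (14) with (16)
    weighted_mean = m_true / var_true - m_g_back / var_g_back    # (15) with (16)
    return precision, weighted_mean


# Hypothetical numbers: neutral soft bit, confident backward Gaussian (variance 0.5).
precision_gM, weighted_mean_gM = minka_information_form(0.0, 0.3, 0.5)
product_precision = precision_gM + 1.0 / 0.5   # precision of the product (13)
print(precision_gM < 0.0, product_precision > 0.0)
```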
Fig. 3. Joint code/channel factor graph.



B. Damping 

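In our numerical experiments (Section V), simply replacing the standard
Gaussian message $\overrightarrow{\mu}_{gs}$ by $\overrightarrow{\mu}_{gM}$
yielded unstable algorithms. Good results were obtained, however, by
geometric mixtures of the form

$$\overrightarrow{\mu}_{g}(x) = \bigl(\overrightarrow{\mu}_{gM}(x)\bigr)^{\alpha} \bigl(\overrightarrow{\mu}_{gs}(x)\bigr)^{1-\alpha} \tag{17}$$

with $0 \le \alpha \le 1$. The mean and the variance of the resulting
Gaussian $\overrightarrow{\mu}_{g}$ are given by

$$1/\overrightarrow{\sigma}_{g}^{2} = \alpha/\overrightarrow{\sigma}_{gM}^{2} + (1 - \alpha)/\overrightarrow{\sigma}_{gs}^{2} \tag{18}$$

and

$$\overrightarrow{m}_{g} = \frac{\overrightarrow{m}_{gM}\,\alpha/\overrightarrow{\sigma}_{gM}^{2} + \overrightarrow{m}_{gs}\,(1 - \alpha)/\overrightarrow{\sigma}_{gs}^{2}}{\alpha/\overrightarrow{\sigma}_{gM}^{2} + (1 - \alpha)/\overrightarrow{\sigma}_{gs}^{2}}. \tag{19}$$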
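The damped message (17)-(19) amounts to a convex combination of precisions and of precision-weighted means, which the following sketch (with an illustrative function name and example numbers) makes explicit.

```python
def damped_message(m_gM, var_gM, m_gs, var_gs, alpha):
    """Mean and variance of the geometric mixture (17), computed via (18) and (19)."""
    precision = alpha / var_gM + (1.0 - alpha) / var_gs                     # (18)
    weighted_mean = m_gM * alpha / var_gM + m_gs * (1.0 - alpha) / var_gs   # numerator of (19)
    return weighted_mean / precision, 1.0 / precision


# Hypothetical example: a Minka message with negative "variance" is pulled
# towards the standard message by a small mixing factor.
print(damped_message(0.2, -1.5, 0.0, 1.0, 0.1))
```

With $\alpha = 0$ the standard message is returned and with $\alpha = 1$ the undamped Minka message. For small $\alpha$, the mixture has positive variance even if the Minka "variance" is negative.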



V. Application Example: Equalization 

Consider the transmission of binary ($\{+1,-1\}$-valued) symbols
$X_1, \ldots, X_n$ over a linear channel with transfer function
$H(z) = \sum_{\ell=0}^{L} h_\ell z^{-\ell}$ and additive white Gaussian
noise $W_1, \ldots, W_n$. The received channel output symbols are
$Y_1, \ldots, Y_n$ with

$$Y_k = \sum_{\ell=0}^{L} h_\ell X_{k-\ell} + W_k, \tag{20}$$

where we assume $X_k = 0$ for $k \le 0$. The binary symbols $X_k$ may or
may not be coded.
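The channel model (20) can be simulated directly, as in the following sketch. The function name, the noise standard deviation, and the seed are assumptions chosen for illustration; list index 0 corresponds to $X_1$.

```python
import random


def simulate_channel(symbols, taps, noise_std, rng):
    """Channel outputs Y_k according to (20) for an FIR channel with the given taps."""
    outputs = []
    for k in range(len(symbols)):
        y = 0.0
        for ell, h in enumerate(taps):
            if k - ell >= 0:  # symbols before the first one are taken to be 0
                y += h * symbols[k - ell]
        outputs.append(y + rng.gauss(0.0, noise_std))
    return outputs


rng = random.Random(0)
symbols = [rng.choice([+1, -1]) for _ in range(10)]
# FIR taps of the example channel used in this paper; noise level is hypothetical.
taps = [0.227, 0.46, 0.688, 0.46, 0.227]
print(simulate_channel(symbols, taps, 0.1, rng))
```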

The joint code/channel factor graph is shown in Fig. 3 with channel-model
details as in Fig. 4. (In the uncoded case, the code graph is missing.) The
factor graph shown in Fig. 4 results from writing (20) in state space form
with suitable matrices A, B, and C, where B is a column vector and C is a
row vector, cf. [2].
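One possible state space form for an FIR channel is sketched below. The particular choice of state (the current symbol followed by the L previous symbols) and the resulting matrices are an assumption made here for illustration and need not coincide with the choice in [2]. The noise-free recursion is S_k = A S_{k-1} + B X_k with output C S_k.

```python
def fir_state_space(taps):
    """Return (A, B, C) for the state S_k = (X_k, X_{k-1}, ..., X_{k-L})."""
    n = len(taps)
    # A shifts the previous state down by one position.
    A = [[1.0 if j == i - 1 else 0.0 for j in range(n)] for i in range(n)]
    # B (column vector) writes the new symbol into the first state component.
    B = [1.0] + [0.0] * (n - 1)
    # C (row vector) holds the channel taps.
    C = list(taps)
    return A, B, C


def run_state_space(symbols, A, B, C):
    """Noise-free channel outputs C S_k with S_k = A S_{k-1} + B X_k, S_0 = 0."""
    n = len(B)
    state = [0.0] * n
    outputs = []
    for x in symbols:
        state = [sum(A[i][j] * state[j] for j in range(n)) + B[i] * x for i in range(n)]
        outputs.append(sum(c * s for c, s in zip(C, state)))
    return outputs


A, B, C = fir_state_space([1.0, 0.5])
print(run_state_space([1, -1, 1], A, B, C))
```

The outputs agree with the noise-free part of (20); the noise $W_k$ would be added to each output.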

Equalization is achieved by forward-backward Gaussian 
message passing (i.e., Kalman smoothing) in the factor graph 
of Fig. [4] according to the recipes stated in [2]. (See [7] for 
a more detailed discussion.) 

In this paper, we are only concerned with the messages along the edges
$X_k$ (towards the channel model) in Fig. 3. Using the standard messages
(9) results in an LMMSE equalizer; in the uncoded case, this algorithm
terminates after a single forward-backward sweep since the factor graph of
Fig. 4 has no cycles. However, using the (damped) Minka messages (17)-(19)
results in an iterative algorithm even in the uncoded case.

Fig. 4. Factor graph of the channel model (one section).

Simulation results for two different channels are given in Figures 5-7.
Figures 5 and 6 show the bit error rate vs. the signal-to-noise ratio (SNR)
for an FIR channel with transfer function
$H(z) = 0.227 + 0.46 z^{-1} + 0.688 z^{-2} + 0.46 z^{-3} + 0.227 z^{-4}$;
Fig. 7 shows the bit error rate vs. the SNR for an IIR channel with
transfer function $H(z) = 1/(1 - 0.9 z^{-1})$. The FIR channel was used as
an example in [8]; because this channel has a spectral null, the difference
between an LMMSE equalizer and the optimal BCJR equalizer is large. The IIR
channel was used as an example in [9].

Two different message update schedules are used: in 
Schedule A, the output messages (along edge X k out of 
the channel model) are initialized to "infinite" variance and 
are updated only after a complete forward-backward Kalman 
sweep; in Schedule B, these messages are updated (and 
immediately used for the corresponding incoming Minka 
message) both during the forward Kalman sweep and the 
backward Kalman sweep. From our simulations, Schedule B 
is clearly superior. 

It is obvious from Figures 5 and 7 that, for uncoded transmission, the
Minka messages provide a very marked improvement over the standard messages
(i.e., over the LMMSE equalizer). In Fig. 5, we almost achieve the
performance of the BCJR (or Viterbi) equalizer (and we also outperform the
decision-feedback equalizer [8, p. 643]). As for Fig. 7, we almost achieve
the performance of the quasi-Viterbi algorithm reported in [9].

For the coded example of Fig. 6, a rate-1/2 convolutional code with
constraint length 7 was used. In this case, the iterative Kalman equalizer
does quite well already with the standard input messages (9), but the Minka
messages do give a further improvement at very small cost.

Fig. 5. Bit error rate vs. SNR for uncoded binary transmission over FIR channel with transfer function $H(z) = 0.227 + 0.46 z^{-1} + 0.688 z^{-2} + 0.46 z^{-3} + 0.227 z^{-4}$. Curves: Minka, Schedule A (1st iteration, i.e., LMMSE; 2nd iteration; 120th iteration); Minka, Schedule B (1st, 2nd, and 120th iteration); BCJR limit.

Fig. 6. Bit error rate vs. SNR for coded binary transmission over FIR channel. Curves: LMMSE/Kalman (10th iteration); Minka, Schedule B (1st and 10th iteration); BCJR (1st and 10th iteration).

Fig. 7. Bit error rate vs. SNR for uncoded binary transmission over IIR channel with transfer function $H(z) = 1/(1 - 0.9 z^{-1})$. Curves: Minka, Schedule A (1st iteration, i.e., LMMSE; 2nd iteration; 10th iteration); Minka, Schedule B (1st, 2nd, and 10th iteration).

A key issue with all these simulations is the choice of the damping/mixing
factor $\alpha$ in (17)-(19). The best results were obtained by changing
$\alpha$ in every iteration. Typical good sequences of values of $\alpha$
are plotted in Fig. 8. We note the following observations:

• The initial values of $\alpha$ are very small.

• After a moderate number of iterations (typically 10 to 20), the bit error
rate stops decreasing. At this point, $\alpha$ is still very small.

• Many more iterations with slowly increasing $\alpha$ are required to
reach a fixed point with $\alpha = 1$.

• At such a fixed point with $\alpha = 1$, the approximation (16) holds
everywhere.
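The qualitative shape described by these observations (small initial values, slow growth, eventual saturation at 1) can be mimicked by a simple geometric ramp. The sketch below is purely hypothetical: the starting value and growth factor are assumptions made here, and it does not reproduce the sequences plotted in Fig. 8.

```python
def alpha_schedule(num_iterations, alpha_start=0.01, growth=1.05):
    """Hypothetical geometric ramp for the mixing factor, clipped at 1.

    alpha_start and growth are illustrative defaults, not values from the paper.
    """
    alphas = []
    alpha = alpha_start
    for _ in range(num_iterations):
        alphas.append(min(alpha, 1.0))
        alpha *= growth
    return alphas


schedule = alpha_schedule(120)
print(schedule[0], schedule[19], schedule[-1])
```

Each value in such a sequence would be used as the mixing factor of the damped message (17)-(19) in the corresponding iteration.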
Fig. 8. Good sequences for $\alpha$ vs. the iteration number k.

VI. Conclusion 

Elaborating on Minka's work, we have formulated a general computation rule
for lossy messages. An important special case is the conversion of
"soft-bit" messages to Gaussian messages. In this case, the resulting
Gaussian message is non-trivial even if the "soft-bit" message is neutral.
By this method, the performance of a Kalman equalizer is significantly
improved.